Boosting Object Representation Learning via Motion and Object Continuity
Recent unsupervised multi-object detection models have shown impressive
performance improvements, largely attributed to novel architectural inductive
biases. Unfortunately, they may produce suboptimal object encodings for
downstream tasks. To overcome this, we propose to exploit object motion and
continuity, i.e., objects do not pop in and out of existence. This is
accomplished through two mechanisms: (i) providing priors on the location of
objects through integration of optical flow, and (ii) a contrastive object
continuity loss across consecutive image frames. Rather than developing an
explicit deep architecture, the resulting Motion and Object Continuity (MOC)
scheme can be instantiated using any baseline object detection model. Our
results show large improvements in the performance of a SOTA model in terms of
object discovery, convergence speed and overall latent object representations,
particularly for playing Atari games. Overall, we show clear benefits of
integrating motion and object continuity for downstream tasks, moving beyond
object representation learning based only on reconstruction.
Comment: 8 pages main text, 32 tables, 21 figures
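
To make mechanism (ii) concrete, here is a minimal sketch of what a contrastive object continuity loss over two consecutive frames could look like. It is not the loss used by MOC: the abstract does not give its form, so this uses a generic InfoNCE-style objective as a stand-in. The names `cosine`, `continuity_loss` and `temperature` are hypothetical, and the sketch assumes that objects have already been matched across frames, so that `enc_t[i]` and `enc_t1[i]` encode the same object.

```python
import math


def cosine(u, v):
    """Cosine similarity between two encoding vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def continuity_loss(enc_t, enc_t1, temperature=0.5):
    """InfoNCE-style loss over object encodings from two consecutive frames.

    Assumption (not from the paper): enc_t[i] and enc_t1[i] belong to the
    same object, and every other encoding in enc_t1 acts as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(enc_t):
        logits = [cosine(anchor, other) / temperature for other in enc_t1]
        # Numerically stable log-sum-exp over all candidates in frame t+1.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of picking the matching object.
        total += log_denom - logits[i]
    return total / len(enc_t)


# Two objects whose encodings change only slightly between frames.
frame_t = [[1.0, 0.0], [0.0, 1.0]]
frame_t1 = [[0.9, 0.1], [0.1, 0.9]]
aligned = continuity_loss(frame_t, frame_t1)
# Swapping the objects in frame t+1 breaks continuity and raises the loss.
swapped = continuity_loss(frame_t, list(reversed(frame_t1)))
```

In this sketch the loss is low when each object's encoding stays close to its own encoding in the next frame and far from those of other objects, which is the intuition behind the statement that objects do not pop in and out of existence.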